Learning Generic Invariances in Object Recognition: Translation and Scale
Authors
Abstract
Invariance to various transformations is key to object recognition, but existing definitions of invariance are somewhat confusing, and discussions of invariance are often confused. In this report we provide an operational definition of invariance by formally defining perceptual tasks as classification problems. The definition should be appropriate for physiology, psychophysics, and computational modeling. For any specific object, invariance can be trivially “learned” by memorizing a sufficient number of example images of the transformed object. While our formal definition of invariance also covers such cases, this report focuses instead on invariance obtained from very few images, and mostly on invariance from a single example. Image-plane invariances – such as translation, rotation, and scaling – can be computed from a single image for any object. They are called generic since, in principle, they can be hardwired or learned (during development) for any object. From this perspective, we characterize the invariance range of a class of feedforward architectures for visual recognition that mimic the hierarchical organization of the ventral stream. We show that this class of models achieves essentially perfect translation and scaling invariance for novel images. In this architecture a new image is represented in terms of weights of “templates” (e.g. “centers” or “basis functions”) at each level of the hierarchy. Such a representation inherits the invariance of each template, which is implemented through replication of the corresponding “simple” units across positions or scales and their “association” in a “complex” unit. We show simulations on real images that characterize the type and number of templates needed to support the invariant recognition of novel objects. We find that (1) the templates need not be visually similar to the target objects, and (2) a very small number of them is sufficient for good recognition. These somewhat surprising empirical results have intriguing implications for the learning of invariant recognition during the development of a biological organism, such as a human baby. In particular, we conjecture that invariance to translation and scale may be learned by the association – through temporal contiguity – of a small number of primal templates, that is, patches extracted from the images of an object moving across positions and scales on the retina. The number of templates can later be augmented by bootstrapping mechanisms that use the correspondence provided by the primal templates, without the need for temporal contiguity.
This version replaces a preliminary CBCL paper that was cited as: Leibo et al., “Invariant Recognition of Objects by Vision,” CBCL-291, November 2, 2010.
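To make the template-and-pooling mechanism described in the abstract concrete, the following is a minimal illustrative sketch (in Python/NumPy, not code from the paper): each template is compared with the image at every position by normalized dot products (the “simple” units), and a “complex” unit max-pools those responses, so the resulting per-template signature does not change when the object is translated. All function names and parameter choices below are assumptions made for the example.

```python
# Minimal sketch (not the paper's code) of the "simple/complex" template
# mechanism: a template is replicated across positions ("simple" units),
# and a "complex" unit pools their responses with a max, giving a
# signature that is invariant to translation of the object within the image.
import numpy as np


def simple_unit_responses(image, template):
    """Normalized dot products of one template with every image patch
    (valid positions only): one response per 'simple' unit."""
    th, tw = template.shape
    t = template.ravel()
    t = t / (np.linalg.norm(t) + 1e-8)
    H, W = image.shape
    responses = np.empty((H - th + 1, W - tw + 1))
    for i in range(H - th + 1):
        for j in range(W - tw + 1):
            patch = image[i:i + th, j:j + tw].ravel()
            patch = patch / (np.linalg.norm(patch) + 1e-8)
            responses[i, j] = patch @ t
    return responses


def complex_unit(image, template):
    """Max-pool the simple-unit responses over all positions."""
    return simple_unit_responses(image, template).max()


def signature(image, templates):
    """Invariant signature: one pooled value per template."""
    return np.array([complex_unit(image, t) for t in templates])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random templates: they need not resemble the object (cf. the abstract).
    templates = [rng.standard_normal((8, 8)) for _ in range(5)]

    # The same 16x16 object placed at two different positions on a blank image.
    obj = rng.standard_normal((16, 16))
    img_a = np.zeros((64, 64)); img_a[8:24, 8:24] = obj
    img_b = np.zeros((64, 64)); img_b[30:46, 40:56] = obj

    # The pooled signatures agree despite the translation.
    print(np.allclose(signature(img_a, templates),
                      signature(img_b, templates)))
```

In this idealized setting (same object over a uniform background, fully inside both images) the two signatures coincide exactly, which is the translation invariance described above; scale invariance would analogously pool each template's responses across a set of rescaled copies of the template.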
Similar resources
Scale and Orientation Invariant Object Recognition using Self-Organizing Maps
This paper proposes a new invariant feature-space system based on the log-polar image representation and Self-Organizing Maps (SOMs). The image representation used, which is inspired by the structure of the human retina, allows data reduction and helps recognize image contents at different scales and orientations. Each object class is represented by a single prototype image, thus allowing classe...
Neural Classifiers for Learning Higher-Order Correlations
Studies by various authors suggest that higher-order networks can be more powerful, and biologically more plausible, than the more traditional multilayer networks. These architectures make explicit use of nonlinear interactions between input variables in the form of higher-order units or product units. If it is known a priori that the problem to be implemented possesses a given set...
Tiled convolutional neural networks
Convolutional neural networks (CNNs) have been successfully applied to many tasks such as digit and object recognition. Using convolutional (tied) weights significantly reduces the number of parameters that have to be learned, and also allows translational invariance to be hard-coded into the architecture. In this paper, we consider the problem of learning invariances, rather than relying on ha...
Direct Modeling of Complex Invariances for Visual Object Features
View-invariant object representations created from feature pooling networks have been widely adopted in state-of-the-art visual recognition systems. Recently, the research community has sought to improve these view-invariant representations further by additional invariance and receptive field learning, or by taking on the challenge of processing massive amounts of learning data. In this paper we con...
Magic Materials: a theory of deep hierarchical architectures for learning sensory representations
We propose that the main computational goal of the ventral stream is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable with respect to small perturbations, and discriminative for recognition, and that this representation may be continuously learned in an unsupervised way during development and natural visual experience. Invariant repres...
Publication date: 2010